Fix HIP memory, graph, and stream handling.#120
Conversation
AWoloszyn
commented
Jun 18, 2026
- Adds HRX-backed HIP pooled allocation support.
- Fixes graph capture/launch semantics and graph dependency validation.
- Implements stable HIP stream IDs and serialized stream recording.
- Tightens stream/context teardown so pending work is synchronized before handles are detached.
- Removes stale HIP-local memory-pool helper code after rebasing onto the HRX pool implementation.
- Preserves HRX pool buffer ownership for pooled allocations.
- Adds overflow/range checks for stream lists, graph memcpy/memset ranges, graph-exec memset sizes, synthetic pointer alignment, and pitched allocations.
- Cleans up context failure paths so partially-created contexts and default memory pools are released consistently.
| reachable_nodes, stack); | ||
| } | ||
| if (!iree_status_is_ok(status)) { | ||
| iree_hal_streaming_graph_deinitialize_additional_edge_index( |
There was a problem hiding this comment.
I think there is a code path where initialize_additional_edge_index calls the deinitialize in an error path and returns an error status, so this additional deinitialize would trigger a double free. Maybe remove the deinit that occurs inside the init func error path since it's handled here.
| HIP_RETURN_ERROR(hipErrorInvalidValue); | ||
| } | ||
|
|
||
| iree_hal_streaming_graph_node_t* node = NULL; |
There was a problem hiding this comment.
Why is this a NULL? Is the type enough or something?
There was a problem hiding this comment.
Yeah, I guess I just don't understand where the event is going here.
There was a problem hiding this comment.
Good catch. Type alone was not enough. Event graph nodes now store and retain the event in the node attrs, and graph exec uses that retained event to
signal/wait. The NULL is now only “no extra trailing node data.”
| } | ||
|
|
||
| iree_hip_per_thread_stream_context = context; | ||
| iree_hip_per_thread_stream = stream; |
There was a problem hiding this comment.
I'm a bit confused on how these get cleaned up when the thread exits. I only see the clear function called earlier in this function body, and again during device reset.
There was a problem hiding this comment.
Addressed with binding-local TLS. The per-thread stream state has a destructor, so thread exit releases the cached stream and frees the small per-thread
state object. Explicit clears still handle device reset/context replacement.
| return true; | ||
| } | ||
|
|
||
| static bool iree_hip_context_invalidate_visible_captures( |
There was a problem hiding this comment.
This is used in a lot of places, and seems expensive if there isn't any capturing going on. E.g., every hipMalloc needs to snapshot all streams just to see if any of them are capturing?
Is there like a bool we can early return on here if no captures are going on?
| iree_hal_streaming_stream_t* copy_stream = | ||
| stream ? stream : context->default_stream; | ||
| iree_hal_streaming_buffer_t* staging = NULL; | ||
| iree_slim_mutex_lock(&context->mutex); |
There was a problem hiding this comment.
Why the context mutex and not the copy_stream->mutex?
There was a problem hiding this comment.
The context mutex is for the shared pageable H2D staging buffer, not for stream submission. Stream command-buffer mutation is separately protected by
copy_stream->mutex.
|
|
||
| // Record copy command. | ||
| iree_hal_streaming_stream_t* copy_stream = | ||
| stream ? stream : context->default_stream; |
There was a problem hiding this comment.
Does D2H not require any mutexes?
There was a problem hiding this comment.
The direct D2H path is a blocking device transfer and does not mutate stream command-buffer state. The fallback command-buffer path does take
copy_stream->mutex while recording.
| } | ||
|
|
||
| iree_status_t status = iree_ok_status(); | ||
| if (!stream || stream == hipStreamLegacy || stream == hipStreamPerThread) { |
There was a problem hiding this comment.
Still trying to understand some of the per-thread stream stuff. Why do we need to synchronize contexts in the per-thread case here?
There was a problem hiding this comment.
Agreed, the previous context-wide sync was too broad. hipStreamPerThread should enqueue like an ordinary stream and not synchronize with unrelated streams. I narrowed the NULL/hipStreamLegacy path so it only synchronizes streams that participate in legacy default-stream ordering, excluding hipStreamNonBlocking; per-thread default streams remain blocking streams and only participate in the legacy interop case.
| if (pGraphNode) *pGraphNode = NULL; | ||
| HIP_RETURN_ERROR(hipErrorNotSupported); | ||
| IREE_TRACE_ZONE_BEGIN(z0); | ||
| if (!pGraphNode || !graph || !childGraph) { |
There was a problem hiding this comment.
Are there really no other checks on the graphs needed here? E.g., can childGraph = graph? Or graph is transitively a child graph of childGraph? I'm not sure what is allowed, or if that's all just up to callers to make sure they don't do some kind of unbounded nesting.
There was a problem hiding this comment.
Added validation in the common graph helper. It rejects childGraph == graph, rejects graphs from a different context, and walks nested child graph nodes to reject recursive containment
Retain graph event node state for record and wait nodes and instantiate event-aware graph exec barriers. Validate child graph nodes for context mismatch and recursive containment, while cleaning up retained child graph and event attrs on destroy and clone failure paths. Tighten capture bookkeeping, per-thread stream cleanup, and stream/copy locking around capture adoption and pageable transfer fallbacks.
Use a common thread-local token for capture ownership checks instead of pthread_self. Restore per-thread stream state to existing binding TLS storage and use IREE call-once/mutex primitives for HIP runtime initialization.
Prevent expensive walks where not needed.